Abstract

Background: Domestic cats benefit from extensive veterinary medical surveillance, which has described nearly 250
genetic diseases analogous to human disorders. Feline infectious agents, including feline immunodeficiency virus,
feline sarcoma virus and feline leukemia virus, offer powerful natural models of deadly human diseases. A rich
veterinary literature of feline disease pathogenesis and the demonstration of a highly conserved ancestral mammal
genome organization make the cat genome annotation a highly informative resource that facilitates multifaceted
research endeavors.

Findings: Here we report a preliminary annotation of the whole genome sequence of Cinnamon, a domestic cat
living in Columbia (MO, USA), bisulfite sequencing of Boris, a male cat from St. Petersburg (Russia), and light 30x
sequencing of Sylvester, a European wildcat progenitor of cat domestication. The annotation includes 21,865
protein-coding genes identified by a comparative approach, 217 loci of endogenous retrovirus-like elements,
repetitive elements which comprise about 55.7% of the whole genome, 99,494 new SNVs, 8,355 new indels, 743,326
evolutionarily constrained elements, and 3,182 microRNA homologues. The methylation sites study shows that 10.5%
of cat genome cytosines are methylated. An assisted assembly of a European wildcat, Felis silvestris silvestris, was
performed; variants between F. silvestris and F. catus genomes were derived and compared to F. catus.

Conclusions: The presented genome annotation extends beyond earlier ones by closing gaps of sequence that 
were unavoidable with previous low-coverage shotgun genome sequencing. The assembly and its annotation offer 
an important resource for connecting the rich veterinary and natural history of cats to genome discovery. 

Keywords: Felis catus, Domestic cat, Felis silvestris silvestris, European wildcat, Genome sequence, Annotation,
Assembly



Data description 

The genome of a female Abyssinian cat ("Cinnamon", who
resides at the University of Missouri-Columbia, USA)
was sequenced at 1.8x and 3.0x whole genome shotgun
(WGS) coverage at Agencourt Inc. For Fca-6.2, an
additional 12x coverage of 454 reads and BAC ends was
sequenced, assembled with CABOG [1] and analysed at
Washington University, St. Louis (USA) [2]. Fca-6.2 is
anchored to chromosome coordinates with two physical
framework maps, a radiation hybrid map [3] and a short
tandem repeat (STR) linkage map [4]. Further, 1,943
distinct sites identified in a recently built linkage map using
a single nucleotide polymorphism (SNP) genotyping array
including ~60,000 SNPs from an Illumina custom cat
genotyping array are also mapped to the assembly.

Here we present a genome browser, Genome Annotation
Resource Fields (GARfield) [5], which displays the
Fca-6.2 assembly and included annotated genome
features. In Table 1 we list the features of GARfield
annotated in the cat genome assembly, which are described
and illustrated in Additional file 1 of this Data Note. The
genome features detected in Fca-6.2 include a merged list
of 21,865 genes derived from a comparative gene
identification strategy using BLAST alignments of gene
exons from eight reference mammalian gene maps
(human, chimpanzee, macaque, dog, cow, horse, rat, and
mouse) obtained from the Ensembl Gene 75 database [6].
In addition, the whole genome methylation sites and a
methylome bisulfite sequence pattern of cat whole blood
cells are presented, previewing epigenetic profiling in
important complex disease associations, including
diseases with viral and neoplastic etiology.

Table 1 Annotated cat genome features available as
genome browser tracks for GARfield and UCSC genome
browsers

Feature

I. Assembly of Felis catus genome Fca-6.2
II. Gene annotation
III. Domestic cat DNA variants
IV. Repeats content
V. Nuclear mitochondrial (Numt) pseudogene fragments
VI. Evolutionary constrained elements (ECE)
VII. Feline endogenous retrovirus-like elements
VIII. Methylation sites
IX. MicroRNA
X. Variants between F. silvestris and F. catus

Additional file 1

Tables S1-S7
Tables S8, S9; Figures S2, S3
Tables S10-S16; Figures S4-S13
Figure S14
Tables S17, S18
Table S19; Figure S18
Table S20
Table S21

Approximately 55.7% of the cat genome is composed
of repetitive elements of familiar classes (LINEs, SINEs,
satellite DNA, LTRs and others). We report more than
25 novel families of complex tandem repeat elements
in the cat genome uncovered by multiple repeat
detection algorithms. We searched for STR-microsatellite
loci useful in population and forensic applications.
Putative PCR primers for 53,710 STR loci are annotated.
We also mapped known feline endogenous retroviral
loci (full length RD114, FeLV, FERV) and detected
125 kb of partial retroviral genome sequences dispersed
across the cat genome. Nuclear mitochondrial (Numt)
DNA pseudogenes derived from ancient transposition
from cytoplasmic mitochondrial chromosomes to nuclear
chromosomal positions comprise 176 kb in addition
to the Lopez-Numt, a 7.8 kb element tandem-repeated
38-76 times on chromosome D2, previously described in
the 1.8x analysis of Cinnamon's genome [7].
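
The STR-microsatellite search mentioned above can be illustrated with a minimal sketch. It is not the repeat detection pipeline used for this annotation, which is described in Additional file 1; it only shows, under simple assumptions, how perfect tandem repeats can be located in a sequence with a regular expression. The function name `find_strs` and the thresholds (repeat unit of 2-6 bp, at least 5 copies) are hypothetical choices made for the example.

```python
import re


def find_strs(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, end, unit, copies) for perfect tandem repeats in seq.

    Illustrative only: thresholds are assumptions, and imperfect repeats
    (with substitutions or indels) are not detected.
    """
    seq = seq.upper()
    # A lazily matched unit of min_unit..max_unit bases, followed by at
    # least (min_repeats - 1) further exact copies of the same unit.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip homopolymer runs such as AAAA..., which would otherwise
            # be reported as a repeat of the unit "AA".
            continue
        copies = (match.end() - match.start()) // len(unit)
        hits.append((match.start(), match.end(), unit, copies))
    return hits


# Toy sequence with eight copies of the dinucleotide CA.
example = "TTG" + "CA" * 8 + "GGT"
print(find_strs(example))  # [(3, 19, 'CA', 8)]
```

Coordinates are 0-based and half-open, so a reported locus can be sliced directly from the sequence before designing flanking primers with a separate tool.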



The earlier 3,078,438 feline single nucleotide variants
(SNVs) [7,8] from largely non-repetitive regions of the cat
genome are supplemented with a group of 99,494
newly annotated SNPs plus 8,355 detected indels. In
addition, we performed an assisted assembly with a 40x
Illumina SOLID DNA sequence coverage of Sylvester, a
European wildcat, F. silvestris silvestris, a wild
representative of the species from which cats were domesticated
approximately 10,000 years ago [9]. Genome variations
(SNVs and indels) between F. silvestris and F. catus
are reported here, and both species' genomes and their
associated data have been uploaded to the GARfield
genome browser (see Availability of supporting data
section).

Our annotation resolved cat homologues of 743,362
evolutionarily constrained elements (ECEs) recently
identified in the human genome by alignment to 29
different mammalian genomes [10], and these were
compared to the conserved sequence blocks obtained by the
reciprocal best match (RBM) screen for cat genes with
seven mammalian genomes (human, chimp, macaque,
dog, cow, rat and mouse). A conservative alignment
approach implicated 54% of the human ECE sequence
comprising ~3% of the cat genome. A total of 3,182
feline microRNA (miRNA) homologues were detected
and mapped based upon homology to miRNA sequences
from 36 species with miRNA sequence described in the
miRBase database [11]. Finally, we screened the genome
sequence for copy number variation and segmental
duplications. All annotated features listed in Table 1 are
described in detail in Additional file 1 and tracked in the
GARfield genome browser.
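
The reciprocal best match (RBM) idea referred to above can be sketched as follows. This is a generic illustration, not the screen used for this annotation: it assumes alignment hits have already been reduced to (query, subject, score) tuples, and the function names and toy gene identifiers are hypothetical.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject.

    hits: iterable of (query, subject, score) tuples, for example parsed
    from tabular alignment output (the parsing step is assumed, not shown).
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_matches(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a's best hit is b and b's is a."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy data: hypothetical cat and human gene identifiers with made-up scores.
cat_vs_human = [
    ("catG1", "humA", 200.0),
    ("catG1", "humB", 150.0),
    ("catG2", "humB", 90.0),
    ("catG3", "humA", 80.0),
]
human_vs_cat = [
    ("humA", "catG1", 210.0),
    ("humB", "catG2", 95.0),
    ("humB", "catG1", 80.0),
]
pairs = reciprocal_best_matches(cat_vs_human, human_vs_cat)
print(pairs)  # [('catG1', 'humA'), ('catG2', 'humB')]
```

In this toy example catG3 is dropped because its best human match, humA, pairs more strongly with catG1; requiring agreement in both directions is what makes the criterion conservative.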

Availability of supporting data 

The assembly sequences are available at the NCBI RefSeq
database (accession numbers #PRJNA175699 and
#PRJNA253950). The annotated features are available
in the Genome Association Resource Fields (GARfield)
genome browser (http://garfield.dobzhanskycenter.org) and
the UCSC Genome Browser (http://genome.ucsc.edu),
which links to a Dobzhansky Center Hub
(http://public.dobzhanskycenter.ru/Hub/hub.txt) (see Section 2
of Additional file 1 for instructions). Supplementary tables
and figures that refer to GARfield features are given in
Additional file 1 and listed in Table 1.
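
As a rough illustration of how the hub listed above might be attached to a UCSC Genome Browser session, the sketch below builds a link using the browser's `db` and `hubUrl` URL parameters. The assembly identifier `felCat5` is an assumption about the UCSC name for Fca-6.2, and the helper `hub_session_url` is hypothetical; the instructions in Section 2 of Additional file 1 remain the authoritative procedure.

```python
from urllib.parse import urlencode

UCSC_TRACKS = "http://genome.ucsc.edu/cgi-bin/hgTracks"
HUB_URL = "http://public.dobzhanskycenter.ru/Hub/hub.txt"


def hub_session_url(db, hub_url=HUB_URL, position=None):
    """Build a UCSC Genome Browser link that attaches a track hub.

    db is the UCSC assembly identifier; the value used below is an
    assumption, not taken from the Data Note. position is an optional
    region string such as "chrD2:1-100000".
    """
    params = {"db": db, "hubUrl": hub_url}
    if position:
        params["position"] = position
    return UCSC_TRACKS + "?" + urlencode(params)


# Assumed assembly name for Fca-6.2 in the UCSC browser.
url = hub_session_url("felCat5")
print(url)
```

The hub address is percent-encoded by `urlencode`, so the resulting link can be pasted into a browser or shared as is.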

Sequence and variation data is available in NCBI 
(SAMN02795853 for Boris the cat and SAMN02898152 
for wildcat) and supporting data is also available in the 
GigaDB repository [12]. 

Additional file 



Additional file 1: Supplementary materials.